Subfusion
=================

Computes the element-wise difference of two input arrays. Three modes are supported: subtraction with a scaling factor, subtraction followed by ReLU activation, and subtraction followed by ReLU6 activation.

**Mode 1 - Subtraction with a scaling factor (subext):**

.. math::

    \text{output}_i = \text{input0}_i - \text{input1}_i \times \alpha

**Mode 2 - Subtraction followed by ReLU (subrelu):**

.. math::

    \text{output}_i = \max(0, \text{input0}_i - \text{input1}_i)

**Mode 3 - Subtraction followed by ReLU6 (subrelu6):**

.. math::

    \text{output}_i = \min(\max(0, \text{input0}_i - \text{input1}_i), 6)

Inputs:
    - **input0** - Address of the first input data.
    - **input1** - Address of the second input data.
    - **alpha** - Scaling factor (subext mode only), applied to ``input1``.
    - **size** - Number of elements to compute (for complex types, the number of complex values).
    - **core_mask** - Core mask (shared-memory versions only).

Output:
    - **output** - Address of the result; its size is the same as that of the inputs.

Supported platforms:
    ``FT78NE`` ``MT7004``

.. note::

    - FT78NE supports fp32, int8, int16, int32, fp64, cplx64, cplx128
    - MT7004 supports fp16, fp32, int16, int32, cplx64

**Shared-memory versions:**

**subext (with scaling factor):**

.. c:function:: void hp_subext_s(half* input0, half* input1, half alpha, half* output, int size, int core_mask)

.. c:function:: void fp_subext_s(float* input0, float* input1, float alpha, float* output, int size, int core_mask)

.. c:function:: void dp_subext_s(double* input0, double* input1, double alpha, double* output, int size, int core_mask)

.. c:function:: void c64_subext_s(float* input0, float* input1, float alpha, float* output, int size, int core_mask)

.. c:function:: void c128_subext_s(double* input0, double* input1, double alpha, double* output, int size, int core_mask)

**subrelu (subtraction + ReLU):**

.. c:function:: void i8_subrelu_s(int8_t* input0, int8_t* input1, int8_t* output, int size, int core_mask)

.. c:function:: void i16_subrelu_s(int16_t* input0, int16_t* input1, int16_t* output, int size, int core_mask)

.. c:function:: void i32_subrelu_s(int32_t* input0, int32_t* input1, int32_t* output, int size, int core_mask)

.. c:function:: void hp_subrelu_s(half* input0, half* input1, half* output, int size, int core_mask)

.. c:function:: void fp_subrelu_s(float* input0, float* input1, float* output, int size, int core_mask)

.. c:function:: void dp_subrelu_s(double* input0, double* input1, double* output, int size, int core_mask)

.. c:function:: void c64_subrelu_s(float* input0, float* input1, float* output, int size, int core_mask)

.. c:function:: void c128_subrelu_s(double* input0, double* input1, double* output, int size, int core_mask)

**subrelu6 (subtraction + ReLU6):**

.. c:function:: void i8_subrelu6_s(int8_t* input0, int8_t* input1, int8_t* output, int size, int core_mask)

.. c:function:: void i16_subrelu6_s(int16_t* input0, int16_t* input1, int16_t* output, int size, int core_mask)

.. c:function:: void i32_subrelu6_s(int32_t* input0, int32_t* input1, int32_t* output, int size, int core_mask)

.. c:function:: void hp_subrelu6_s(half* input0, half* input1, half* output, int size, int core_mask)

.. c:function:: void fp_subrelu6_s(float* input0, float* input1, float* output, int size, int core_mask)

.. c:function:: void dp_subrelu6_s(double* input0, double* input1, double* output, int size, int core_mask)

.. c:function:: void c64_subrelu6_s(float* input0, float* input1, float* output, int size, int core_mask)

.. c:function:: void c128_subrelu6_s(double* input0, double* input1, double* output, int size, int core_mask)

**C usage example (subext):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 12

    // FT78NE example
    #include
    #include
    int main(int argc, char* argv[])
    {
        float *input0 = (float *)0xA0000000; // first input, in DDR space
        float *input1 = (float *)0xA1000000; // second input, in DDR space
        float *output = (float *)0xB0000000; // output
        float alpha = 0.5f;                  // scaling factor
        int size = 1000;
        int core_mask = 0xff;
        fp_subext_s(input0, input1, alpha, output, size, core_mask);
        return 0;
    }

**C usage example (subrelu):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 11

    // FT78NE example
    #include
    #include
    int main(int argc, char* argv[])
    {
        float *input0 = (float *)0xA0000000;
        float *input1 = (float *)0xA1000000;
        float *output = (float *)0xB0000000;
        int size = 1000;
        int core_mask = 0xff;
        fp_subrelu_s(input0, input1, output, size, core_mask);
        return 0;
    }

**C usage example (subrelu6):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 11

    // FT78NE example
    #include
    #include
    int main(int argc, char* argv[])
    {
        float *input0 = (float *)0xA0000000;
        float *input1 = (float *)0xA1000000;
        float *output = (float *)0xB0000000;
        int size = 1000;
        int core_mask = 0xff;
        fp_subrelu6_s(input0, input1, output, size, core_mask);
        return 0;
    }

**Private-memory versions:**

**subext (with scaling factor):**

.. c:function:: void hp_subext_p(half* input0, half* input1, half alpha, half* output, int size)

.. c:function:: void fp_subext_p(float* input0, float* input1, float alpha, float* output, int size)

.. c:function:: void dp_subext_p(double* input0, double* input1, double alpha, double* output, int size)

.. c:function:: void c64_subext_p(float* input0, float* input1, float alpha, float* output, int size)

.. c:function:: void c128_subext_p(double* input0, double* input1, double alpha, double* output, int size)

**subrelu (subtraction + ReLU):**

.. c:function:: void i8_subrelu_p(int8_t* input0, int8_t* input1, int8_t* output, int size)

.. c:function:: void i16_subrelu_p(int16_t* input0, int16_t* input1, int16_t* output, int size)

.. c:function:: void i32_subrelu_p(int32_t* input0, int32_t* input1, int32_t* output, int size)

.. c:function:: void hp_subrelu_p(half* input0, half* input1, half* output, int size)

.. c:function:: void fp_subrelu_p(float* input0, float* input1, float* output, int size)

.. c:function:: void dp_subrelu_p(double* input0, double* input1, double* output, int size)

.. c:function:: void c64_subrelu_p(float* input0, float* input1, float* output, int size)

.. c:function:: void c128_subrelu_p(double* input0, double* input1, double* output, int size)

**subrelu6 (subtraction + ReLU6):**

.. c:function:: void i8_subrelu6_p(int8_t* input0, int8_t* input1, int8_t* output, int size)

.. c:function:: void i16_subrelu6_p(int16_t* input0, int16_t* input1, int16_t* output, int size)

.. c:function:: void i32_subrelu6_p(int32_t* input0, int32_t* input1, int32_t* output, int size)

.. c:function:: void hp_subrelu6_p(half* input0, half* input1, half* output, int size)

.. c:function:: void fp_subrelu6_p(float* input0, float* input1, float* output, int size)

.. c:function:: void dp_subrelu6_p(double* input0, double* input1, double* output, int size)

.. c:function:: void c64_subrelu6_p(float* input0, float* input1, float* output, int size)

.. c:function:: void c128_subrelu6_p(double* input0, double* input1, double* output, int size)

**C usage example (private-memory version):**

.. code-block:: c
    :linenos:
    :emphasize-lines: 10

    // FT78NE example
    #include
    #include
    int main(int argc, char* argv[])
    {
        float *input0 = (float *)0x10000000; // first input, in L2 space
        float *input1 = (float *)0x10001000; // second input, in L2 space
        float *output = (float *)0x10002000; // output
        int size = 1000;
        fp_subrelu_p(input0, input1, output, size);
        return 0;
    }
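
**Host-side reference sketch (fp32 only):**

The three mode formulas given at the top of this page can be cross-checked on a host machine with plain scalar loops. The block below is a minimal sketch under stated assumptions: it covers only the fp32 case, and the names ``ref_subext_f32``, ``ref_subrelu_f32`` and ``ref_subrelu6_f32`` are hypothetical helpers introduced here for illustration. They are not part of the library and do not describe how the library functions are implemented.

```c
/* Hypothetical host-side reference helpers (NOT library functions).
 * Each one is a direct scalar transcription of one mode formula. */

/* Mode 1 (subext): output[i] = input0[i] - input1[i] * alpha */
void ref_subext_f32(const float *input0, const float *input1,
                    float alpha, float *output, int size)
{
    for (int i = 0; i < size; ++i) {
        output[i] = input0[i] - input1[i] * alpha;
    }
}

/* Mode 2 (subrelu): output[i] = max(0, input0[i] - input1[i]) */
void ref_subrelu_f32(const float *input0, const float *input1,
                     float *output, int size)
{
    for (int i = 0; i < size; ++i) {
        float diff = input0[i] - input1[i];
        output[i] = diff > 0.0f ? diff : 0.0f;
    }
}

/* Mode 3 (subrelu6): output[i] = min(max(0, input0[i] - input1[i]), 6) */
void ref_subrelu6_f32(const float *input0, const float *input1,
                      float *output, int size)
{
    for (int i = 0; i < size; ++i) {
        float diff = input0[i] - input1[i];
        float relu = diff > 0.0f ? diff : 0.0f;  /* clamp below at 0 */
        output[i] = relu < 6.0f ? relu : 6.0f;   /* clamp above at 6 */
    }
}
```

One possible use is to run a library function such as ``fp_subrelu_p`` and the matching helper on the same input buffers and compare the two outputs element by element. For subext, small floating-point differences may appear if a platform fuses the multiply and the subtract, so a tolerance-based comparison is likely the safer choice there.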